'\" t
.TH samtools-fasta 1 "14 July 2025" "samtools-1.22.1" "Bioinformatics tools"
.SH NAME
samtools-fasta, samtools-fastq \- converts a SAM/BAM/CRAM file to FASTA or FASTQ
.\"
.\" Copyright (C) 2008-2011, 2013-2020, 2023-2024 Genome Research Ltd.
.\" Portions copyright (C) 2010, 2011 Broad Institute.
.\"
.\" Author: Heng Li <lh3@sanger.ac.uk>
.\" Author: Joshua C. Randall <jcrandall@alum.mit.edu>
.\"
.\" Permission is hereby granted, free of charge, to any person obtaining a
.\" copy of this software and associated documentation files (the "Software"),
.\" to deal in the Software without restriction, including without limitation
.\" the rights to use, copy, modify, merge, publish, distribute, sublicense,
.\" and/or sell copies of the Software, and to permit persons to whom the
.\" Software is furnished to do so, subject to the following conditions:
.\"
.\" The above copyright notice and this permission notice shall be included in
.\" all copies or substantial portions of the Software.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
.\" IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
.\" FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
.\" THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
.\" LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
.\" FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
.\" DEALINGS IN THE SOFTWARE.
.
.\" For code blocks and examples (cf groff's Ultrix-specific man macros)
.de EX

.  in +\\$1
.  nf
.  ft CR
..
.de EE
.  ft
.  fi
.  in

..
.
.SH SYNOPSIS
.PP
samtools fastq
.RI [ options ]
.I in.bam
.br
samtools fasta
.RI [ options ]
.I in.bam

.SH DESCRIPTION
.PP
Converts a BAM or CRAM into either FASTQ or FASTA format depending on the
command invoked. The files will be automatically compressed if the
file names have a .gz, .bgz, or .bgzf extension.

Note this command is attempting to reverse the alignment process, so
if the aligner took a single input FASTQ and produced multiple SAM
records via supplementary and/or secondary alignments, then converting
back to FASTQ again should produce the original single FASTA / FASTQ
record.  By default it will not attempt to records for supplementary
and secondary alignments, but see the \fB-F\fR option for more details.

If the input contains read-pairs which are to be interleaved or
written to separate files in the same order, then the input should
be first collated by name.
Use
.B samtools collate
or
.B samtools sort -n
to ensure this.

For each different QNAME, the input records are categorised according to
the state of the READ1 and READ2 flag bits.
The three categories used are:

1 : Only READ1 is set.

2 : Only READ2 is set.

0 : Either both READ1 and READ2 are set; or neither is set.

The exact meaning of these categories depends on the sequencing technology
used.
It is expected that ordinary single and paired-end sequencing reads will be
in categories 1 and 2 (in the case of paired-end reads, one read of the pair
will be in category 1, the other in category 2).
Category 0 is essentially a \*(lqcatch-all\*(rq for reads that do not
fit into a simple paired-end sequencing model.

For each category only one sequence will be written for a given QNAME.
If more than one record is available for a given QNAME and category,
the first in input file order that has quality values will be used.
If none of the candidate records has quality values, then the first in
input file order will be used instead.

Sequences will be written to standard output unless one of the
.BR -1 ", " -2 ", " -o ", or " -0
options is used, in which case sequences for that category will be written to
the specified file.
The same filename may be specified with multiple options, in which case the
sequences will be multiplexed in order of occurrence.

If a singleton file is specified using the
.B -s
option then only paired sequences will be output for categories 1 and 2;
paired meaning that for a given QNAME there are sequences for both
category 1
.B and
2.
If there is a sequence for only one of categories 1 or 2 then it will be
diverted into the specified singletons file.
This can be used to prepare fastq files for programs that cannot handle
a mixture of paired and singleton reads.

The
.B -s
option only affects category 1 and 2 records.
The output for category 0 will be the same irrespective of the use of this
option.

The sequence generated will be for the entire sequence recorded in the
SAM record (and quality if appropriate).  This means if it has
soft-clipped CIGAR records then the soft-clipped data will be in the
output FASTA/FASTQ.  Hard-clipped data is, by definition, absent from
the SAM record and hence will be absent in any FASTA/FASTQ produced.

The filter options order of precedence is -d/-D, -f, -F, --rf and -G.

.SH OPTIONS
.TP 8
.B -n
By default, either '/1' or '/2' is added to the end of read names
where the corresponding READ1 or READ2 FLAG bit is set.
Using
.B -n
causes read names to be left as they are.
.TP 8
.B -N
Always add either '/1' or '/2' to the end of read names
even when put into different files.
.TP 8
.B -O
Use quality values from OQ tags in preference to standard quality string
if available.
.TP 8
.B -s FILE
Write singleton reads to FILE.
.TP 8
.B -t
Copy RG, BC and QT tags to the FASTQ header line, if they exist.
.TP 8
.B -T TAGLIST
Specify a comma-separated list of tags to copy to the FASTQ header line, if
they exist.
\fBTAGLIST\fR can be blank or \fB*\fR to indicate all tags should be copied to
the output.
If using \fB*\fR, be careful to quote it to avoid unwanted shell expansion.
.TP 8
.B -1 FILE
Write reads with the READ1 FLAG set (and READ2 not set) to FILE instead of
outputting them.
If the
.B -s
option is used, only paired reads will be written to this file.
.TP 8
.B -2 FILE
Write reads with the READ2 FLAG set (and READ1 not set) to FILE instead of
outputting them.
If the
.B -s
option is used, only paired reads will be written to this file.
.TP 8
.B -o FILE
Write reads with either READ1 FLAG or READ2 flag set to FILE instead
of outputting them to stdout.  This is equivalent to \fB-1\fR FILE
\fB-2\fR FILE.
.TP 8
.B -0 FILE
Write reads where the READ1 and READ2 FLAG bits set are either both set
or both unset to FILE instead of outputting them.
.TP 8
.BI "-f " INT
Only output alignments with all bits set in
.I INT
present in the FLAG field.
.I INT
can be specified in hex by beginning with `0x' (i.e. /^0x[0-9A-F]+/)
or in octal by beginning with `0' (i.e. /^0[0-7]+/) [0].
.TP 8
.BI "-F " INT ", ", --excl-flags " INT ", --exclude-flags " INT
Do not output alignments with any bits set in
.I INT
present in the FLAG field.
.I INT
can be specified in hex by beginning with `0x' (i.e. /^0x[0-9A-F]+/)
or in octal by beginning with `0' (i.e. /^0[0-7]+/) [0x900].
This defaults to 0x900 representing filtering of secondary and
supplementary alignments.
.TP 8
.BI "--rf " INT " , --incl-flags " INT ", --include-flags " INT
Only output alignments with any bits set in
.I INT
present in the FLAG field.
.I INT
can be specified in hex by beginning with `0x' (i.e. /^0x[0-9A-F]+/),
in octal by beginning with `0' (i.e. /^0[0-7]+/), as a decimal number
not beginning with '0' or as a comma-separated list of flag names [0].
.TP
.BI "-G " INT
Only EXCLUDE reads with all of the bits set in
.I INT
present in the FLAG field.
.I INT
can be specified in hex by beginning with `0x' (i.e. /^0x[0-9A-F]+/)
or in octal by beginning with `0' (i.e. /^0[0-7]+/) [0].
.TP 8
.BI "-d " TAG [: VAL ]
Only output alignments containing an auxiliary tag matching both
\fITAG\fR and \fIVAL\fR.  If \fIVAL\fR is omitted then any value is
accepted.  The tag types supported are i, f, Z, A and H.  "B" arrays
are not supported.  This is comparable to the method used in
\fBsamtools view -d\fR.  The option may be specified multiple times
and is equivalent to using the \fB-D\fR option.
.TP 8
.BI "-D " TAG:FILE
Only output alignments containing an auxiliary tag matching \fITAG\fR
and having a value listed in \fIFILE\fR.  The format of the file is
one line per value.  This is equivalent to specifying \fB-d\fR
multiple times.
.TP 8
.B -i
add Illumina Casava 1.8 format entry to header (eg 1:N:0:ATCACG)
.TP 8
.B -c [0..9]
set compression level when writing gz or bgzf fastq files.
.TP 8
.B --i1 FILE
write first index reads to FILE
.TP 8
.B --i2 FILE
write second index reads to FILE
.TP 8
.B --barcode-tag TAG
aux tag to find index reads in [default: BC]
.TP 8
.B --quality-tag TAG
aux tag to find index quality in [default: QT]
.TP
.BI "-@, --threads " INT
Number of input/output compression threads to use in addition to main thread [0].
.TP 8
.B --index-format STR
string to describe how to parse the barcode and quality tags. For example:

.RS
.TP 8
.B i14i8
the first 14 characters are index 1, the next 8 characters are index 2
.TP 8
.B n8i14
ignore the first 8 characters, and use the next 14 characters for index 1

If the tag contains a separator, then the numeric part can be replaced with '*' to
mean 'read until the separator or end of tag', for example:
.TP 8
.B n*i*
ignore the left part of the tag until the separator, then use the second part
.RE

.SH EXAMPLES
Starting from a coordinate sorted file, output paired reads to
separate files, discarding singletons, supplementary and secondary reads.
The resulting files can be used with, for example, the
.B bwa
aligner.
.EX 4
samtools collate -u -O in_pos.bam | \\
samtools fastq -1 paired1.fq -2 paired2.fq -0 /dev/null -s /dev/null -n
.EE

Starting with a name collated file, output paired and singleton reads
in a single file, discarding supplementary and secondary reads.
To get all of the reads in a single file, it is necessary to redirect the
output of samtools fastq.
The output file is suitable for use with
.B bwa mem -p
which understands interleaved files containing a mixture of paired and
singleton reads.
.EX 4
samtools fastq -0 /dev/null in_name.bam > all_reads.fq
.EE

Output paired reads in a single file, discarding supplementary and
secondary reads.
Save any singletons in a separate file.
Append /1 and /2 to read names.
This format is suitable for use by
.B NextGenMap
when using its
.BR -p " and " -q " options."
With this aligner, paired reads must be mapped separately to the singletons.
.EX 4
samtools fastq -0 /dev/null -s single.fq -N in_name.bam > paired.fq
.EE

.SH BUGS
.IP o 2
The way of specifying output files is far too complicated and easy to get wrong.

.SH AUTHOR
.PP
Written by Heng Li, with modifications by Martin Pollard and Jennifer Liddle,
all from the Sanger Institute.

.SH SEE ALSO
.IR samtools (1),
.IR samtools-faidx (1),
.IR samtools-fqidx (1)
.IR samtools-import (1)
.PP
Samtools website: <http://www.htslib.org/>
